Skip to content

Conversation

@shasson5
Copy link
Contributor

@shasson5 shasson5 commented Oct 5, 2025

Only last commit is relevant (other commits are merged from another branch)

What?

Enforce host memory support for mem_type EP

Why?

Prevent cuda_ipc from being selected for mem_type EP

Summary by CodeRabbit

Release Notes

  • Improvements

    • Optimized CUDA endpoint flushing with enhanced per-stream tracking and event management for improved efficiency.
    • Unified CUDA copy and CUDA IPC interfaces to use CUDA-optimized flush implementation for consistent behavior.
  • Configuration Changes

    • Removed ENABLE_SAME_PROCESS configuration option from CUDA IPC, simplifying deployment setup.

@shasson5 shasson5 added the WIP-DNM Work in progress / Do not review label Oct 5, 2025

if ((getpid() == *(pid_t*)params->iface_addr) && same_uuid &&
!iface->config.enable_same_process) {
if ((getpid() == *(pid_t*)params->iface_addr) && same_uuid) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we will always prevent cuda_ipc from same process. is that what we want? if not maybe:

if ((getpid() == *(pid_t*)params->iface_addr) && same_uuid) {
    return uct_iface_scope_is_reachable(tl_iface, params);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this block will be removed all together but as this PR is not yet ready, some things is not commited

@shasson5 shasson5 removed the WIP-DNM Work in progress / Do not review label Oct 22, 2025
@shasson5 shasson5 requested review from gleon99 and yosefe October 22, 2025 11:03
@shasson5 shasson5 added the WIP-DNM Work in progress / Do not review label Oct 27, 2025
@gleon99 gleon99 requested a review from amastbaum November 2, 2025 09:17
@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

Walkthrough

Introduces host memory-awareness for UCP memory-type endpoints, implements a per-endpoint CUDA flush mechanism with stream event tracking, removes the enable_same_process configuration option from CUDA IPC, updates endpoint flush bindings across CUDA transports, and adjusts tests accordingly.

Changes

Cohort / File(s) Summary
UCP endpoint memory-type constraints
src/ucp/core/ucp_ep.c
Adds host memory-awareness by intersecting host_mem_access_tls with mem_access_tls for memory-type endpoints and enforces a single-lane constraint (num_lanes == 1) for mem-type EPs through assertions.
CUDA base flush infrastructure
src/uct/cuda/base/cuda_iface.h, src/uct/cuda/base/cuda_iface.c
Introduces per-stream flush event tracking with uct_cuda_flush_desc_t and uct_cuda_flush_stream_desc_t types; adds uct_cuda_base_ep_flush function that enqueues flush events per stream, tracks completion via shared counter, and invokes completion callback on finish; adds dedicated flush_mpool and completion helper callback.
CUDA transport flush binding updates
src/uct/cuda/cuda_copy/cuda_copy_iface.c
Replaces generic base flush with CUDA-specific flush implementation in interface operations.
CUDA IPC configuration and flush
src/uct/cuda/cuda_ipc/cuda_ipc_iface.h, src/uct/cuda/cuda_ipc/cuda_ipc_iface.c
Removes enable_same_process field from iface config struct and config table; removes same-PID reachability shortcut logic; updates flush binding to CUDA-specific implementation.
Test updates
test/gtest/ucp/test_ucp_device.cc, test/gtest/ucp/test_ucp_peer_failure.cc, test/gtest/uct/test_uct_iface.cc
Removes UCX_CUDA_IPC_ENABLE_SAME_PROCESS environment variable from device test; adds cuda fragment size parameter to peer failure test; removes virtual is_self_reachable method and associated self-unreachable test class.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant EP as CUDA EP
    participant Flush as Flush Handler
    participant Streams as Active Streams
    participant Completion
    
    Caller->>EP: uct_cuda_base_ep_flush()
    alt No active streams
        EP-->>Caller: UCS_OK (immediate)
    else Active streams exist
        EP->>Flush: Allocate base flush descriptor
        loop For each active stream
            EP->>Streams: Allocate per-stream flush descriptor
            Streams->>Streams: Register callback on stream event queue
            EP->>EP: Increment shared stream_counter
        end
        EP-->>Caller: UCS_INPROGRESS
        par Stream Processing
            Streams->>Streams: Process stream events
            Streams->>Flush: Invoke completion callback when counter reaches zero
            Flush->>Completion: Call user completion
            Flush->>Flush: Free flush descriptor
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Attention areas:
    • CUDA flush mechanism logic in cuda_iface.c: verify event enqueuing, counter synchronization, and callback sequencing
    • Memory-type endpoint constraints in ucp_ep.c: confirm host memory access intersection logic and lane count assertions
    • Configuration removal impact: ensure enable_same_process removal doesn't break backward compatibility expectations
    • Test changes in test_uct_iface.cc: verify that removal of self_unreachable test class is intentional and doesn't mask regressions

Suggested reviewers

  • ofirfarjun7

Poem

🐰 A flush flows swift through streams aligned,
Per-EP events now well-designed,
Same-process flags we leave behind,
With memory-types and lanes refined,
The CUDA path now intertwined! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title focuses on CUDA_IPC and host memory support for mem_type EP, but the changeset includes substantial modifications across multiple CUDA transport layers, API removals, and test changes beyond the stated scope. Clarify whether the title should reflect the broader scope (e.g., 'CUDA transports: host memory enforcement and IPC configuration cleanup') or if the PR scope should be narrowed to focus primarily on mem_type EP host memory validation.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@shasson5 shasson5 removed the WIP-DNM Work in progress / Do not review label Nov 12, 2025
coderabbitai[bot]

This comment was marked as spam.

@openucx openucx deleted a comment from coderabbitai bot Nov 12, 2025
@openucx openucx deleted a comment from coderabbitai bot Nov 12, 2025
@openucx openucx deleted a comment from coderabbitai bot Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants